Methods and systems for determining a pregnancy-related state of a subject

ABSTRACT

The invention generally relates to methods for assessing the health of a tissue by characterizing circulating nucleic acids in a biological sample. According to certain embodiments, methods for assessing the health of a tissue include the steps of detecting a sample level of RNA in a biological sample, comparing the sample level of RNA to a reference level of RNA specific to the tissue, determining whether a difference exists between the sample level and the reference level, and characterizing the tissue as abnormal if a difference is detected.

CROSS REFERENCE

This application claims benefit and is a Continuation of application Ser. No. 17/497,358, filed Oct. 8, 2021, which is a Continuation of application Ser. No. 17/187,298, filed Feb. 26, 2021, which is a Continuation of application Ser. No. 16/373,996, filed Apr. 3, 2019, now abandoned, which is a Continuation of application Ser. No. 15/377,894 filed Dec. 13, 2016, now U.S. Pat. No. 10,287,632 issued May 14, 2019, which is a Continuation of application Ser. No. 14/861,650 filed Sep. 22, 2015, now U.S. Pat. No. 10,240,200 issued Mar. 26, 2019, which is a Divisional of application Ser. No. 13/752,131 filed Jan. 28, 2013, now abandoned, which claims benefit of U.S. Provisional Patent Application No. 61/591,642, filed on Jan. 27, 2012, which applications are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present invention relates to the field of nucleic acid analysis from a biological sample containing genetic material. Specifically, methods of the invention relate to quantitating tissue-specific nucleic acids in a biological sample.

BACKGROUND

It is often challenging to gauge the health of organs within an individual's body. Physicians are often forced to use expensive imaging techniques or perform invasive biopsies for cancer screening to identify diagnostic biomarkers and monitor tumor initiation and progression. The invasive nature of biopsies makes them unsuitable for widespread screening of patients. In addition, many diagnostic biomarkers are only identified in cancer cell lines or from biopsy specimens obtained from patients with late-stage disease and metastasis.

The presence of circulating nucleic acids (DNA and RNA) detectable in the plasma and serum of cancer patients has been investigated for its potential use as markers for diagnostic purposes, with the obvious benefit being a non-invasive diagnostic tool. It has been shown that markers within the plasma are identical to the ones found in the carcinogenic tissue of the patient. Circulating RNA is of particular interest for use in early detection cancer screenings due to RNA markers' close association with malignancy.

In addition to cancer detection, the discovery of fetal specific cell-free RNA present in maternal plasma has opened up new horizons on prenatal molecular diagnostics (see e.g., Poon et al., Clinical Chemistry, 46(11): 1832-1834 (2000)). Specifically, analysis of plasma RNA holds promise for noninvasive gene expression profiling of the fetus. However, only a handful of pregnancy specific cell-free RNA transcripts have been characterized to date. A comprehensive profiling of such RNA has not been performed.

A problem with analyzing cell-free RNA in non-maternal and maternal blood is the lack of suitable data to estimate the biological causes of the cell-free RNA present. For example, there is no reliable method for determining tissue origins of the cell-free RNA present in blood.

SUMMARY

The present invention provides methods for profiling the origin of the cell-free RNA to assess the health of an organ or tissue. Deviations in normal cell-free transcriptomes are caused when organ/tissue-specific transcripts are released into the blood in large amounts as those organs/tissues begin to fail or are attacked by the immune system or pathogens. As a result, an inflammation process can occur as part of the body's complex biological response to these harmful stimuli. The invention, according to certain aspects, utilizes tissue-specific RNA transcripts of healthy individuals to deduce the relative optimal contributions of different tissues in the normal cell-free transcriptome, with each tissue-specific RNA transcript of the sample being indicative of the apoptotic rate of that tissue. The normal cell-free transcriptome serves as a baseline or reference level to assess tissue health of other individuals. The invention includes a comparative measurement of the cell-free transcriptome of a sample to the normal cell-free transcriptome to assess the sample levels of tissue-specific transcripts circulating in plasma and to assess the health of tissues contributing to the cell-free transcriptome.

In addition to normal reference levels, methods of the invention also utilize reference levels for cell-free transcriptomes specific to other patient populations. Using methods of the invention one can determine the relative contribution of tissue-specific transcripts to the cell-free transcriptome of maternal subjects, fetus subjects, and/or subjects having a condition or disease.

By analyzing the health of tissue based on tissue-specific transcripts, methods of the invention advantageously allow one to assess the health of a tissue without relying on disease-related protein biomarkers. In certain aspects, methods of the invention assess the health of a tissue by comparing a sample level of RNA in a biological sample to a reference level of RNA specific to a tissue, determining whether a difference exists between the sample level and the reference level, and characterizing the tissue as abnormal if a difference is detected. For example, if a patient's RNA expression levels for a specific tissue differ from the RNA expression levels for the specific tissue in the normal cell-free transcriptome, this indicates that the patient's tissue is not functioning properly.

In certain aspects, methods of the invention involve assessing health of a tissue by characterizing the tissue as abnormal if a specified level of RNA is present in the blood. The method may further include detecting a level of RNA in a blood sample, comparing the sample level of RNA to a reference level of RNA specific to a tissue, determining whether a difference exists between the sample level and the reference level, and characterizing the tissue as abnormal if the sample level and the reference level are the same.

The present invention also provides methods for comprehensively profiling fetal specific cell-free RNA in maternal plasma and deconvoluting the cell-free transcriptome of fetal origin with relative proportion to different fetal tissue types. Methods of the invention involve the use of next-generation sequencing technology and/or microarrays to characterize the cell-free RNA transcripts that are present in maternal plasma at different stages of pregnancy. Quantification of these transcripts allows one to deduce changes of these genes across different trimesters, and hence provides a way of quantification of temporal changes in transcripts.

Methods of the invention allow diagnosis and identification of the potential for complications during or after pregnancy. Methods also allow the identification of pregnancy-associated transcripts which, in turn, elucidate maternal and fetal developmental programs. Methods of the invention are useful for preterm diagnosis as well as elucidation of transcript profiles associated with fetal developmental pathways generally. Thus, methods of the invention are useful to characterize fetal development and are not limited to characterization only of disease states or complications associated with pregnancy. Exemplary embodiments of the methods are described in the detailed description, claims, and figures provided below.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.

FIG. 1 depicts a listing of the top detected female pregnancy associated differentially expressed transcripts.

FIG. 2 shows plots of the two main principal components for cell free RNA transcript levels obtained in Example 1.

FIG. 3A-3B depicts a heatmap of the top 100 cell free transcript levels exhibiting different temporal levels in preterm and normal pregnancy using microarrays.

FIG. 3C-3D depicts a heatmap of the top 100 cell free transcript levels exhibiting different temporal levels in preterm and normal pregnancy using RNA-Seq.

FIG. 4 depicts a ranking of the top 20 transcripts differentially expressed between pre-term and normal pregnancy.

FIG. 5 depicts results of a Gene Ontology analysis on the top 20 common RNA transcripts of FIG. 4, showing those transcripts enriched for proteins that are attached (integrated or loosely bound) to the plasma membrane or on the membranes of the platelets.

FIG. 6 depicts the gene expression profile for PVALB across the different trimesters, showing that the premature births [highlighted in blue] have higher levels of cell free RNA transcripts than normal pregnancy.

FIG. 7A-7B outlines exemplary process steps for determining the relative tissue contributions to a cell-free transcriptome of a sample.

FIG. 8A-8B depicts the panel of selected fetal tissue-specific transcripts generated in Example 2.

FIG. 9A-9B depict the raw data of parallel quantification of the fetal tissue-specific transcripts showing changes across maternal time-points (first trimester, second trimester, third trimester, and post partum) using the actual cell free RNA as well as the cDNA library of the same cell free RNA.

FIG. 10A-10B illustrates relative expression of placental genes across maternal time points (first trimester, second trimester, third trimester, and post partum).

FIG. 11A-11B illustrates relative expression of fetal brain genes across maternal time points (first trimester, second trimester, third trimester, and post partum).

FIG. 12A-12B illustrates relative expression of fetal liver genes across maternal time points (first trimester, second trimester, third trimester, and post partum).

FIG. 13 illustrates the relative composition of different organs' contributions towards a plasma adult cell free transcriptome.

FIG. 14 illustrates a decomposition of organ contribution towards a plasma adult cell free transcriptome using RNA-seq data.

FIG. 15A-15D depicts a panel of 94 tissue-specific genes in Example 3 that were verified with qPCR.

FIG. 16 shows a heat map of the tissue specific transcripts of FIG. 15, being detectable in the cell free RNA.

FIG. 17 depicts a flow-diagram of this method according to certain embodiments.

FIG. 18A-18AAAAAV depicts a list of tissue-specific genes for Example 3 that was obtained using raw data from the Human U133A/GNF1H Gene Atlas and RNA-Seq Atlas databases.

DETAILED DESCRIPTION

Methods and materials described herein apply a combination of next-generation sequencing and microarray techniques for detecting, quantitating and characterizing RNA sequences present in a biological sample. In certain embodiments, the biological sample contains a mixture of genetic material from different genomic sources, i.e. a pregnant female and a fetus.

Unlike other methods of digital analysis in which the nucleic acid in the sample is isolated to a nominal single target molecule in a small reaction volume, methods of the present invention are conducted without diluting or distributing the genetic material in the sample. Methods of the invention allow for simultaneous screening of multiple transcriptomes, and provide informative sequence information for each transcript at the single-nucleotide level, thus providing the capability for non-invasive, high throughput screening for a broad spectrum of diseases or conditions in a subject from a limited amount of biological sample.

In one particular embodiment, methods of the invention involve analysis of mixed fetal and maternal RNA in the maternal blood to identify differentially expressed transcripts throughout different stages of pregnancy that may be indicative of a preterm or pathological pregnancy. Differential detection of transcripts is achieved, in part, by isolating and amplifying plasma RNA from the maternal blood throughout the different stages of pregnancy, and quantitating and characterizing the isolated transcripts via microarray and RNA-Seq.

Methods and materials specific for analyzing a biological sample containing RNA (including non-maternal, maternal, maternal-fetus mixed) as described herein, are merely one example of how methods of the invention can be applied and are not intended to limit the invention. Methods of the invention are also useful to screen for the differential expression of target genes related to cancer diagnosis, progression and/or prognosis using cell-free RNA in blood, stool, sputum, urine, transvaginal fluid, breast nipple aspirate, cerebrospinal fluid, etc.

In certain embodiments, methods of the invention generally include the following steps: obtaining a biological sample containing genetic material from different genomic sources, isolating total RNA from the biological sample containing a mixture of genetic material from different genomic sources, preparing amplified cDNA from total RNA, sequencing amplified cDNA, digital counting and analysis, and profiling the amplified cDNA.

Methods of the invention also involve assessing the health of a tissue contributing to the cell-free transcriptome. In certain embodiments, the invention involves assessing the cell-free transcriptome of a biological sample to determine tissue-specific contributions of individual tissues to the cell-free transcriptome. According to certain aspects, the invention assesses the health of a tissue by detecting a sample level of RNA in a biological sample, comparing the sample level of RNA to a reference level of RNA specific to the tissue, and characterizing the tissue as abnormal if a difference is detected. This method is applicable to characterize the health of a tissue in non-maternal subjects, pregnant subjects, and live fetuses. FIG. 17 depicts a flow-diagram of this method according to certain embodiments.

In certain aspects, methods of the invention employ a deconvolution of a reference cell-free RNA transcriptome to determine a reference level for a tissue. Preferably, the reference cell-free RNA transcriptome is a normal, healthy transcriptome, and the reference level of a tissue is a relative level of RNA specific to the tissue present in the blood of healthy, normal individuals. Methods of the invention assume that apoptotic cells from different tissue types release their RNA into plasma of a subject. Each of these tissues expresses a specific number of genes unique to the tissue type, and the cell-free RNA transcriptome of a subject is a summation of the different tissue types. Each tissue may express one or more genes. In certain embodiments, the reference level is a level associated with one of the genes expressed by a certain tissue. In other embodiments, the reference level is a level associated with a plurality of genes expressed by a certain tissue. It should be noted that a reference level or threshold amount for a tissue-specific transcript present in circulating RNA may be zero or a positive number.

For healthy, normal subjects, the relative contributions of circulating RNA from different tissue types are relatively stable, and each tissue-specific RNA transcript of the cell-free RNA transcriptome for normal subjects can serve as a reference level for that tissue. Applying methods of the invention, a tissue is characterized as unhealthy or abnormal if a sample includes a level of RNA that differs from a reference level of RNA specific to the tissue. The tissue of the sample may be characterized as unhealthy if the actual level of RNA is statistically different from the reference level. Statistical significance can be determined by any method known in the art. These measurements can be used to screen for organ health, as a diagnostic tool, and as a tool to measure response to pharmaceuticals or in clinical trials to monitor health.
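By way of non-limiting illustration, the comparison of a sample level to a reference level might be sketched as follows. The use of a z-score, the threshold of 3.0, the function names, and the numeric levels are all assumptions made for illustration only; as noted above, any statistical method known in the art may be used.

```python
from statistics import mean, stdev


def z_score(sample_level, reference_levels):
    """Standard score of a sample level relative to a reference cohort."""
    mu = mean(reference_levels)
    sigma = stdev(reference_levels)  # sample standard deviation of the cohort
    if sigma == 0:
        raise ValueError("reference levels show no variation")
    return (sample_level - mu) / sigma


def is_abnormal(sample_level, reference_levels, threshold=3.0):
    """Flag the tissue when the sample level lies far from the reference mean.

    The threshold is an illustrative assumption, not a value from the text.
    """
    return abs(z_score(sample_level, reference_levels)) > threshold


# Hypothetical relative RNA levels for one tissue in five healthy subjects.
reference = [0.10, 0.12, 0.11, 0.09, 0.13]

print(is_abnormal(0.11, reference))  # level close to the reference mean
print(is_abnormal(0.30, reference))  # level far above the reference mean
```

In this sketch the decision is two-sided, so both elevated and depressed levels of a tissue-specific transcript are flagged.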

If a difference is detected between the sample level of RNA and the reference level of RNA, such difference suggests that the associated tissue is not functioning properly. The change in circulating RNA may be the precursor to organ failure or indicate that the tissue is being attacked by the immune system or pathogens. If a tissue is identified as abnormal, the next step(s), according to certain embodiments, may include more extensive testing of the tissue (e.g. invasive biopsy of the tissue), prescribing a course of treatment specific to the tissue, and/or routine monitoring of the tissue.

Methods of the invention can be used to infer organ health non-invasively. This non-invasive testing can be used to screen for appendicitis, incipient diabetes and pathological conditions induced by diabetes such as nephropathy, neuropathy, retinopathy etc. In addition, the invention can be used to determine the presence of graft versus host disease in organ transplants, particularly in bone marrow transplant recipients whose new immune system is attacking the skin, GI tract or liver. The invention can also be used to monitor the health of solid organ transplant recipients such as heart, lung and kidney. The methods of the invention can assess likelihood of prematurity, preeclampsia and anomalies in pregnancy and fetal development. In addition, methods of the invention could be used to identify and monitor neurological disorders (e.g. multiple sclerosis and Alzheimer's disease) that involve cell specific death (e.g. of neurons or due to demyelination) or that involve the generation of plaques or protein aggregation.

A cell-free transcriptome for purposes of determining a reference level for tissue-specific transcripts can be the cell-free transcriptome of one or more normal subjects, maternal subjects, subjects having certain conditions or diseases, or fetus subjects. In the case of certain conditions, the reference level of a tissue is a level of RNA specific to the tissue present in blood of one or more subjects having a certain disease or condition. In such aspect, the method includes detecting a level of RNA in a blood sample, comparing the sample level of RNA to a reference level of RNA specific to a tissue, determining whether a difference exists between the sample level and the reference level, and characterizing the tissue as abnormal if the sample level and the reference level are the same.

A deconvolution of a cell-free transcriptome is used to determine the relative contribution of each tissue type towards the cell-free RNA transcriptome. The following steps are employed to determine the relative RNA contributions of certain tissues in a sample. First, a panel of tissue-specific transcripts is identified. Second, total RNA in plasma from a sample is determined using methods known in the art. Third, the total RNA is assessed against the panel of tissue-specific transcripts, and the total RNA is considered a summation of these different tissue-specific transcripts. Quadratic programming can be used as a constrained optimization method to deduce the relative optimal contributions of different organs/tissues towards the cell-free transcriptome of the sample.

One or more databases of genetic information can be used to identify a panel of tissue-specific transcripts. Accordingly, aspects of the invention provide systems and methods for the use and development of a database. Particularly, methods of the invention utilize databases containing existing data generated across tissue types to identify the tissue-specific genes. Databases utilized for identification of tissue-specific genes include the Human U133A/GNF1H Gene Atlas and RNA-Seq Atlas, although any other database or literature can be used. In order to identify tissue-specific transcripts from one or more databases, certain embodiments apply a template-matching algorithm to the databases. Template matching algorithms used to filter data are known in the art, see e.g., Pavlidis P, Noble W S (2001) Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol 2:research0042.1-0042.15.
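By way of non-limiting illustration, one simple form of template matching may be sketched as follows: the expression profile of each gene across tissues is correlated with an idealized template that is 1 in a single tissue and 0 elsewhere, and the gene is assigned to that tissue when the correlation is high. The Pearson correlation, the cutoff of 0.9, the gene names, and the expression values are assumptions for illustration and are not taken from the cited databases or reference.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def tissue_specific_genes(expression, tissues, min_corr=0.9):
    """Assign each gene to the tissues whose on/off template it matches.

    expression maps a gene name to expression levels ordered as `tissues`.
    min_corr is an illustrative cutoff, not a value from the text.
    """
    panel = {tissue: [] for tissue in tissues}
    for gene, levels in expression.items():
        for i, tissue in enumerate(tissues):
            # Template: expressed only in tissue i, silent in all others.
            template = [1.0 if j == i else 0.0 for j in range(len(tissues))]
            if pearson(levels, template) >= min_corr:
                panel[tissue].append(gene)
    return panel


# Hypothetical atlas values; gene names and numbers are invented.
tissues = ["placenta", "liver", "brain", "kidney"]
atlas = {
    "GENE_A": [95.0, 2.0, 3.0, 1.0],     # concentrated in placenta
    "GENE_B": [1.0, 80.0, 4.0, 2.0],     # concentrated in liver
    "GENE_C": [30.0, 31.0, 29.0, 32.0],  # broadly expressed
}

panel = tissue_specific_genes(atlas, tissues)
print(panel)
```

A broadly expressed gene such as the hypothetical GENE_C matches no template and is therefore left out of the panel.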

In certain embodiments, quadratic programming is used as a constrained optimization method to deduce relative optimal contributions of different organs/tissues towards the cell-free transcriptome in a sample. Quadratic programming is known in the art and described in detail in D. Goldfarb and A. Idnani (1982). Dual and Primal-Dual Methods for Solving Strictly Convex Quadratic Programs. In J. P. Hennart (ed.), Numerical Analysis, Springer-Verlag, Berlin, pages 226-239, and D. Goldfarb and A. Idnani (1983). A numerically stable dual method for solving strictly convex quadratic programs. Mathematical Programming, 27, 1-33.

FIG. 7 outlines exemplary process steps for determining the relative tissue contributions to a cell-free transcriptome of a sample. Using information provided by one or more tissue-specific databases, a panel of tissue-specific genes is generated with a template-matching function. A quality control function can be applied to filter the results. A blood sample is then analyzed to determine the relative contribution of each tissue-specific transcript to the total RNA of the sample. Cell-free RNA is extracted from the sample, and the cell-free RNA extractions are processed using one or more quantification techniques (e.g. standard microarrays and RNA-sequence protocols). The obtained gene expression values for the sample are then normalized. This involves rescaling of all gene expression values to the housekeeping genes. Next, the sample's total RNA is assessed against the panel of tissue-specific genes using quadratic programming in order to determine the tissue-specific relative contributions to the sample's cell-free transcriptome. The following constraints are employed to obtain the estimated relative contributions during the quadratic programming analysis: a) the RNA contributions of different tissues are greater than or equal to zero, and b) the sum of all contributions to the cell-free transcriptome equals one.
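By way of non-limiting illustration, the normalization and constrained deconvolution steps may be sketched as follows. The sketch minimizes the squared difference between the observed sample levels and a weighted sum of tissue signatures, subject to constraints a) and b) above. It uses projected gradient descent onto the probability simplex as a simple stand-in for the dual quadratic programming methods cited above; the signature matrix, gene names, housekeeping gene, and iteration count are hypothetical.

```python
def normalize_to_housekeeping(expression, housekeeping):
    """Rescale all expression values by the mean level of the housekeeping genes."""
    scale = sum(expression[g] for g in housekeeping) / len(housekeeping)
    if scale <= 0:
        raise ValueError("housekeeping genes must have positive expression")
    return {gene: value / scale for gene, value in expression.items()}


def project_to_simplex(v):
    """Euclidean projection onto {x : x >= 0 and sum(x) == 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def deconvolve(signatures, sample, iterations=2000):
    """Estimate tissue fractions x minimizing ||A x - b||^2 on the simplex.

    signatures[g][t] is the level of gene g in tissue t (matrix A);
    sample[g] is the observed normalized level of gene g (vector b).
    """
    n_genes, n_tissues = len(signatures), len(signatures[0])
    # 2 * squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(a * a for row in signatures for a in row))
    x = [1.0 / n_tissues] * n_tissues
    for _ in range(iterations):
        residual = [sum(a * xi for a, xi in zip(row, x)) - b
                    for row, b in zip(signatures, sample)]
        gradient = [2.0 * sum(signatures[g][t] * residual[g] for g in range(n_genes))
                    for t in range(n_tissues)]
        x = project_to_simplex([xi - step * gi for xi, gi in zip(x, gradient)])
    return x


# Hypothetical raw sample values; HK1 stands in for a housekeeping gene.
raw = {"G1": 6.3, "G2": 3.1, "G3": 1.6, "HK1": 10.0}
normalized = normalize_to_housekeeping(raw, ["HK1"])

# Hypothetical signatures: rows are genes G1..G3, columns are three tissues.
signatures = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.0, 1.0],
]
sample = [normalized[g] for g in ("G1", "G2", "G3")]

fractions = deconvolve(signatures, sample)
print([round(f, 3) for f in fractions])
```

The projection step enforces both constraints at every iteration, so the returned fractions are non-negative and sum to one. The hypothetical sample was constructed from tissue fractions of 0.6, 0.3, and 0.1, which the sketch is expected to recover.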

Methods of the invention for determining the relative contributions for each tissue can be used to determine the reference level for the tissue. That is, a certain population of subjects (e.g., maternal, normal, and cancerous) can be subjected to the deconvolution process outlined in FIG. 7 to obtain reference levels of tissue-specific gene expression for that patient population. When relative tissue contributions are considered individually, quantification of each of these tissue-specific transcripts can be used as a measure for the reference apoptotic rate of that particular tissue for that particular population. For example, blood from one or more healthy, normal individuals can be analyzed to determine the relative RNA contribution of tissues to the cell-free RNA transcriptome for healthy, normal individuals. Each relative RNA contribution of tissue that makes up the normal RNA transcriptome is a reference level for that tissue.

According to certain embodiments, an unknown sample of blood can be subjected to the process outlined in FIG. 7 to determine the relative tissue contributions to the cell-free RNA transcriptome of that sample. The relative tissue contributions of the sample are then compared to one or more reference levels of the relative contributions to a reference cell-free RNA transcriptome. If a specific tissue shows a contribution to the cell-free RNA transcriptome in the sample that is greater or less than the contribution of the specific tissue in the reference cell-free RNA transcriptome, then the tissue exhibiting differential contribution may be characterized accordingly. If the reference cell-free transcriptome represents a healthy population, a tissue exhibiting a differential RNA contribution in a sample cell-free transcriptome can be classified as unhealthy.
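By way of non-limiting illustration, the comparison of sample tissue contributions to reference contributions may be sketched as follows. The tolerance of 0.05, the tissue names, the labels, and the fractions are assumptions for illustration; in practice the allowed deviation would be derived from the variability observed in the reference population.

```python
def classify_tissues(sample, reference, tolerance=0.05):
    """Label each reference tissue by how its sample contribution deviates.

    sample and reference map tissue names to fractional contributions.
    tolerance is an illustrative allowed deviation, not a value from the text.
    """
    labels = {}
    for tissue, reference_fraction in reference.items():
        difference = sample.get(tissue, 0.0) - reference_fraction
        if difference > tolerance:
            labels[tissue] = "elevated"
        elif difference < -tolerance:
            labels[tissue] = "reduced"
        else:
            labels[tissue] = "within reference"
    return labels


# Hypothetical fractional contributions to the cell-free transcriptome.
reference = {"liver": 0.20, "kidney": 0.10, "whole blood": 0.70}
sample = {"liver": 0.35, "kidney": 0.09, "whole blood": 0.56}

labels = classify_tissues(sample, reference)
print(labels)
```

Because the contributions sum to one, an elevated contribution from one tissue necessarily lowers the others, so the labels are best read together rather than tissue by tissue.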

The biological sample can be blood, saliva, sputum, urine, semen, transvaginal fluid, cerebrospinal fluid, sweat, breast milk, breast fluid (e.g., breast nipple aspirate), stool, a cell or a tissue biopsy. In certain embodiments, samples of the same biological sample are obtained at multiple different time points in order to analyze differential transcript levels in the biological sample over time. For example, maternal plasma may be analyzed in each trimester. In some embodiments, the biological sample is drawn blood containing circulating nucleic acids, such as cell-free RNA. The cell-free RNA may be from different genomic sources and is found in the blood or plasma, rather than in cells.

In a particular embodiment, the drawn blood is maternal blood. In order to obtain a sufficient amount of nucleic acids for testing, it is preferred that approximately 10-50 mL of blood be drawn. However, less blood may be drawn for a genetic screen in which less statistical significance is required, or in which the RNA sample is enriched for fetal RNA.

Methods of the invention involve isolating total RNA from a biological sample. Total RNA can be isolated from the biological sample using any methods known in the art. In certain embodiments, total RNA is extracted from plasma. Plasma RNA extraction is described in Enders et al., “The Concentration of Circulating Corticotropin-releasing Hormone mRNA in Maternal Plasma Is Increased in Preeclampsia,” Clinical Chemistry 49: 727-731, 2003. As described there, plasma harvested after centrifugation steps is mixed with Trizol LS reagent (Invitrogen) and chloroform. The mixture is centrifuged, and the aqueous layer transferred to new tubes. Ethanol is added to the aqueous layer. The mixture is then applied to an RNeasy mini column (Qiagen) and processed according to the manufacturer's recommendations.

In the embodiments where the biological sample is maternal blood, the maternal blood may optionally be processed to enrich the fetal RNA concentration in the total RNA. For example, after extraction, the RNA can be separated by gel electrophoresis and the gel fraction containing circulatory RNA with a size corresponding to fetal RNA (e.g., <300 bp) is carefully excised. The RNA is extracted from this gel slice and eluted using methods known in the art.

Alternatively, fetal specific RNA may be concentrated by known methods, including centrifugation and various enzyme inhibitors. The RNA is bound to a selective membrane (e.g., silica) to separate it from contaminants. The RNA is preferably enriched for fragments circulating in the plasma, which are less than 300 bp. This size selection is done on an RNA size separation medium, such as an electrophoretic gel or chromatography material.

Flow cytometry techniques can also be used to enrich for fetal cells in maternal blood (Herzenberg et al., PNAS 76: 1453-1455 (1979); Bianchi et al., PNAS 87: 3279-3283 (1990); Bruch et al., Prenatal Diagnosis 11: 787-798 (1991)). U.S. Pat. No. 5,432,054 also describes a technique for separation of fetal nucleated red blood cells, using a tube having a wide top and a narrow, capillary bottom made of polyethylene. Centrifugation using a variable speed program results in a stacking of red blood cells in the capillary based on the density of the molecules. The density fraction containing low-density red blood cells, including fetal red blood cells, is recovered and then differentially hemolyzed to preferentially destroy maternal red blood cells. A density gradient in a hypertonic medium is used to separate red blood cells, now enriched in the fetal red blood cells, from lymphocytes and ruptured maternal cells. The use of a hypertonic solution shrinks the red blood cells, which increases their density, and facilitates purification from the more dense lymphocytes. After the fetal cells have been isolated, fetal RNA can be purified using standard techniques in the art.

Further, an agent that stabilizes cell membranes may be added to the maternal blood to reduce maternal cell lysis, including but not limited to aldehydes, urea formaldehyde, phenol formaldehyde, DMAE (dimethylaminoethanol), cholesterol, cholesterol derivatives, high concentrations of magnesium, vitamin E, and vitamin E derivatives, calcium, calcium gluconate, taurine, niacin, hydroxylamine derivatives, bimoclomol, sucrose, astaxanthin, glucose, amitriptyline, isomer A hopane tetral phenylacetate, isomer B hopane tetral phenylacetate, citicoline, inositol, vitamin B, vitamin B complex, cholesterol hemisuccinate, sorbitol, calcium, coenzyme Q, ubiquinone, vitamin K, vitamin K complex, menaquinone, zonegran, zinc, Ginkgo biloba extract, diphenylhydantoin, perftoran, polyvinylpyrrolidone, phosphatidylserine, tegretol, PABA, disodium cromoglycate, nedocromil sodium, phenytoin, zinc citrate, mexitil, dilantin, sodium hyaluronate, or poloxamer 188.

An example of a protocol for using this agent is as follows: The blood is stored at 4° C. until processing. The tubes are spun at 1000 rpm for ten minutes in a centrifuge with braking power set at zero. The tubes are spun a second time at 1000 rpm for ten minutes. The supernatant (the plasma) of each sample is transferred to a new tube and spun at 3000 rpm for ten minutes with the brake set at zero. The supernatant is transferred to a new tube and stored at −80° C. Approximately two milliliters of the “buffy coat,” which contains maternal cells, is placed into a separate tube and stored at −80° C.

Methods of the invention also involve preparing amplified cDNA from total RNA. cDNA is prepared and indiscriminately amplified without diluting the isolated RNA sample or distributing the mixture of genetic material in the isolated RNA into discrete reaction samples. Preferably, amplification is initiated at the 3′ end as well as randomly throughout the whole transcriptome in the sample to allow for amplification of both mRNA and non-polyadenylated transcripts. The double-stranded cDNA amplification products are thus optimized for the generation of sequencing libraries for Next Generation Sequencing platforms. Suitable kits for amplifying cDNA in accordance with the methods of the invention include, for example, the Ovation® RNA-Seq System.

Methods of the invention also involve sequencing the amplified cDNA. While any known sequencing method can be used to sequence the amplified cDNA mixture, single molecule sequencing methods are preferred. Preferably, the amplified cDNA is sequenced by whole transcriptome shotgun sequencing (also referred to herein as “RNA-Seq”). Whole transcriptome shotgun sequencing (RNA-Seq) can be accomplished using a variety of next-generation sequencing platforms such as the Illumina Genome Analyzer platform, ABI Solid Sequencing platform, or Life Science's 454 Sequencing platform.

Methods of the invention further involve subjecting the cDNA to digital counting and analysis. The number of amplified sequences for each transcript in the amplified sample can be quantitated via sequence reads (one read per amplified strand). Unlike previous methods of digital analysis, sequencing allows for the detection and quantitation at the single nucleotide level for each transcript present in a biological sample containing genetic material from different genomic sources and therefore multiple transcriptomes.

After digital counting, the ratios of the various amplified transcripts can be compared to determine relative amounts of differential transcript in the biological sample. Where multiple biological samples are obtained at different time-points, the differential transcript levels can be characterized over the course of time.
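By way of non-limiting illustration, digital counting and comparison across two time points may be sketched as follows. Reads already assigned to transcripts are tallied, scaled to counts per million so that samples of different sequencing depth are comparable, and compared as ratios. The counts-per-million scaling, the pseudocount, the transcript names, and the read lists are assumptions for illustration.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """Tally one count per read and scale to counts per million reads."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {transcript: count * 1_000_000 / total
            for transcript, count in counts.items()}


def fold_changes(earlier, later, pseudocount=1.0):
    """Ratio of later to earlier level for every transcript seen at either time.

    The pseudocount avoids division by zero for undetected transcripts.
    """
    transcripts = set(earlier) | set(later)
    return {t: (later.get(t, 0.0) + pseudocount) / (earlier.get(t, 0.0) + pseudocount)
            for t in transcripts}


# Hypothetical read-to-transcript assignments at two time points.
first_trimester = ["T1"] * 2 + ["T2"] * 8
third_trimester = ["T1"] * 6 + ["T2"] * 4

changes = fold_changes(counts_per_million(first_trimester),
                       counts_per_million(third_trimester))
for transcript in sorted(changes):
    print(transcript, round(changes[transcript], 2))
```

In this hypothetical data the T1 transcript rises roughly threefold and the T2 transcript falls by about half between the two time points.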

Differential transcript levels within the biological sample can also be analyzed via microarray techniques. The amplified cDNA can be used to probe a microarray containing gene transcripts associated with one or more conditions or diseases, such as any prenatal condition, or any type of cancer, inflammatory, or autoimmune disease.

It will be understood that methods and any flow diagrams disclosed herein can be implemented by computer program instructions. These program instructions may be provided to a computer processor, such that the instructions, which execute on the processor, create means for implementing the actions specified in the flowchart blocks or described in methods for assessing tissue disclosed herein. The computer program instructions may be executed by a processor to cause a series of operational steps to be performed by the processor to produce a computer implemented process. The computer program instructions may also cause at least some of the operational steps to be performed in parallel. Moreover, some of the steps may also be performed across more than one processor, such as might arise in a multi-processor computer system. In addition, one or more processes may also be performed concurrently with other processes or even in a different sequence than illustrated without departing from the scope or spirit of the invention.

The computer program instructions can be stored on any suitable computer-readable medium including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computing device.

EXAMPLES

Example 1: Profiling Maternal Plasma Cell-Free RNA by RNA Sequencing - A Comprehensive Approach

Overview:

The plasma RNA profiles of 5 pregnant women were collected during the first trimester, second trimester, and post-partum, together with those of 2 non-pregnant female donors and 2 male donors, using both microarray and RNA-Seq.

Among these pregnancies, there were 2 pregnancies with clinical complications such as premature birth and one pregnancy with bi-lobed placenta. Comparison of these pregnancies against normal cases reveals genes that exhibit significantly different gene expression patterns across different temporal stages of pregnancy. Application of such a technique to samples associated with complicated pregnancies may help identify transcripts that can be used as molecular markers that are predictive of these pathologies.

Study Design and Methods:

Subjects

Samples were collected from 5 pregnant women during the first trimester, second trimester, third trimester, and post-partum. As a control, blood plasma samples were also collected from 2 non-pregnant female donors and 2 male donors.

Blood Collection and Processing

Blood samples were collected in EDTA tubes and centrifuged at 1600 g for 10 min at 4° C. Supernatants were placed in 1 ml aliquots in 1.5 ml microcentrifuge tubes, which were then centrifuged at 16000 g for 10 min at 4° C. to remove residual cells. Supernatants were then stored in 1.5 ml microcentrifuge tubes at −80° C. until use.

RNA Extraction and Amplification

The cell-free maternal plasma RNA was extracted using Trizol LS reagent. The extracted and purified total RNA was converted to cDNA and amplified using the RNA-Seq Ovation Kit (NuGen). (The above steps were the same for both microarray and RNA-Seq sample preparation.)

The cDNA was fragmented using DNase I and labeled with biotin, followed by hybridization to Affymetrix GeneChip ST 1.0 microarrays. The Illumina sequencing platform and standard Illumina library preparation protocols were used for sequencing.

Data Analysis: Correlation Between Microarray and RNA-Seq

The RMA algorithm was applied to process the raw microarray data for background correction and normalization. RPKM values of the sequenced transcripts were obtained using the CASAVA 1.7 pipeline for RNA-Seq. The RPKM values from RNA-Seq and the probe intensities from the microarray were converted to log2 scale. For the RNA-Seq data, to avoid taking the log of 0, gene expression values with an RPKM of 0 were set to 0.01 prior to taking logs. Correlation coefficients between the two platforms were then calculated.
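The log transformation and platform comparison described above can be sketched as follows. This is an illustrative Python example only: the RPKM values and probe intensities are hypothetical, and the helper names `log2_rpkm` and `pearson` are assumptions introduced here, not part of the CASAVA or RMA software.

```python
import math

RPKM_FLOOR = 0.01  # value substituted for an RPKM of 0 before taking logs


def log2_rpkm(rpkm_values):
    """Log2-transform RPKM values, replacing zeros with the floor value."""
    return [math.log2(v if v > 0 else RPKM_FLOOR) for v in rpkm_values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-gene values for one sample on each platform.
rpkm = [0.0, 1.0, 4.0, 16.0]
probe_intensity = [20.0, 64.0, 256.0, 1024.0]

r = pearson(log2_rpkm(rpkm), [math.log2(v) for v in probe_intensity])
print(round(r, 3))
```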

Differential Expression of RNA Transcripts Levels Using RNA-Seq

Differential gene expression analysis was performed using edgeR, a set of library functions specifically written to analyze digital gene expression data. Gene Ontology analysis was then performed using DAVID to identify significantly enriched GO terms.

Principal Component Analysis & Identification of Significant Time Varying Genes

Principal component analysis was carried out using a custom script in R. To identify time varying genes, the time course library of functions in R was used to implement empirical Bayes methods for assessing differential expression in experiments involving a time course, which in our case comprises the different trimesters and post-partum for each individual patient.
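For illustration, the principal component step can be sketched in Python. This is not the custom R script referred to above; it is a stand-in that finds the first principal component by power iteration on the sample covariance matrix. The expression values and the function name are hypothetical.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (direction, scores) for the first principal component.

    samples: list of rows, one row of transcript levels per sample.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration from an arbitrary non-zero start vector.
    v = [1.0 + j for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical transcript levels: two samples from each of two groups.
profiles = [
    [1.0, 1.0, 0.0],
    [1.2, 0.9, 0.1],
    [5.0, 5.0, 0.2],
    [5.1, 4.8, 0.0],
]
direction, scores = first_principal_component(profiles)
print([round(s, 2) for s in scores])
```

Samples from the same group receive scores of the same sign along this component, which is the kind of separation a plot of the leading principal components is meant to reveal.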

Results and Discussion

RNA-Seq reveals that pregnancy-associated transcripts are detected at significantly different levels between pregnant and non-pregnant subjects.

A comparison of transcript levels derived using RNA-Seq and Gene Ontology analysis between pregnant and non-pregnant subjects revealed that transcripts exhibiting differential levels are significantly associated with female pregnancy, suggesting that RNA-Seq enables observation of real differences between these two classes of transcriptome due to pregnancy. The top-ranked significantly expressed gene is PLAC4, which is also known as a target in previous studies for developing an RNA-based test for trisomy 21. A listing of the top detected female pregnancy associated differentially expressed transcripts is shown in FIG. 1.

Principal Component Analysis (PCA) on Plasma Cell Free RNA Transcript Levels in Maternal Plasma Distinguishes Between Premature and Normal Pregnancy

Using the plasma cell free transcript level profiles as inputs for Principal Component Analysis, the profiles from each patient at different time points clustered into different pathological clusters, suggesting that the cell free RNA transcript profile in maternal plasma may be used to distinguish between pre-term and non-preterm pregnancy.

Plasma cell free RNA levels were quantified using both microarray and RNA-Seq. Transcript expression level profiles from microarray and RNA-Seq for each patient are correlated, with a Pearson correlation of approximately 0.7. Plots of the two main principal components for cell free RNA transcript levels are shown in FIG. 2.

Identification of Cell Free RNA Transcripts in Maternal Plasma Exhibiting Significantly Different Time Varying Trends Between Pre-Term and Normal Pregnancy Across all Three Trimesters and Post-Partum

A heatmap of the top 100 cell free transcript levels exhibiting different temporal levels in preterm and normal pregnancy using microarrays is shown in FIG. 3A. A heatmap of the top 100 cell free transcript levels exhibiting different temporal levels in preterm and normal pregnancy using RNA-Seq is shown in FIG. 3B.

Common Cell Free RNA Transcripts Identified by Microarray and RNA-Seq which Exhibit Significantly Different Time Varying Trends Between Pre-Term and Normal Pregnancy Across all Three Trimesters and Post-Partum

A ranking of the top 20 transcripts differentially expressed between pre-term and normal pregnancy is shown in FIG. 4. These top 20 common RNA transcripts were analyzed using Gene Ontology and were shown to be enriched for proteins that are attached (integrated or loosely bound) to the plasma membrane or to the membranes of platelets (see FIG. 5).

Gene Expression Profiles for PVALB

The protein encoded by the PVALB gene is a high affinity calcium ion-binding protein that is structurally and functionally similar to calmodulin and troponin C. The encoded protein is thought to be involved in muscle relaxation. As shown in FIG. 6, the gene expression profile for PVALB across the different trimesters shows that the premature births [highlighted in blue] have higher levels of cell free RNA transcripts as compared to normal pregnancy.

Conclusion:

Results from quantification and characterization of maternal plasma cell-free RNA using RNA-Seq strongly suggest that pregnancy associated transcripts can be detected.

Furthermore, both RNA-Seq and microarray methods can detect a considerable number of gene transcripts whose levels showed differential time trends that have a high probability of being associated with premature births.

The methods described herein can be modified to investigate pregnancies of different pathological situations and can also be modified to investigate temporal changes at more frequent time points.

Example 2: Quantification of Tissue-Specific Cell-Free RNA Exhibiting Temporal Variation During Pregnancy

Overview:

Cell-free fetal DNA found in maternal plasma has been exploited extensively for non-invasive diagnostics. In contrast, cell-free fetal RNA, which has been shown to be similarly detectable in maternal circulation, has yet to be applied widely as a form of diagnostics. Both fetal cell-free RNA and DNA face similar challenges in distinguishing the fetal from the maternal component because in both cases the maternal component dominates. To detect cell-free RNA of fetal origin, focus can be placed on genes that are highly expressed only during fetal development, which are subsequently inferred to be of fetal origin and easily distinguished from background maternal RNA. Such a perspective is corroborated by studies that have established that cell-free fetal RNA derived from genes that are highly expressed in the placenta is detectable in maternal plasma during pregnancy.

A significant characteristic that sets RNA apart from DNA is the dynamic nature of RNA transcripts, which is well reflected during fetal development. Life begins as a series of well-orchestrated events that starts with fertilization to form a single-cell zygote and ends with a multi-cellular organism with diverse tissue types. During pregnancy, the majority of fetal tissues undergo extensive remodeling and contain functionally diverse cell types. This underlying diversity can be generated as a result of differential gene expression from the same nuclear repertoire, where the quantity of RNA transcripts dictates that different cell types make different amounts of proteins, despite their genomes being identical. The human genome comprises approximately 30,000 genes. Only a small set of genes is transcribed to RNA within a particular differentiated cell type. These tissue-specific RNA transcripts have been identified through many studies and databases involving developing fetuses of classical animal models. Combining the available literature with high throughput data generated from samples via sequencing, the entire collection of RNA transcripts contained within maternal plasma can be characterized.

Fetal organ formation during pregnancy depends on successive programs of gene expression. Temporal regulation of RNA quantity is necessary to generate this progression of cell differentiation events that accompany fetal organ genesis. To unravel similar temporal dynamics for cell free RNA, the expression profile of maternal plasma cell free RNA, especially the selected fetal tissue-specific panel of genes, was analyzed across all three trimesters of pregnancy and post-partum. Leveraging the capability of high throughput qPCR and sequencing technologies for simultaneous quantification of cell free fetal tissue-specific RNA transcripts, a system level view of the spectrum of RNA transcripts with fetal origins in maternal plasma was obtained. In addition, maternal plasma was analyzed to deconvolute the heterogeneous cell free transcriptome of fetal origin into relative proportions of the different fetal tissue types. This approach incorporated physical constraints regarding the fetal contributions in maternal plasma; specifically, the fractional contributions of the fetal tissues were required to be non-negative and to sum to one during all three trimesters of the pregnancy. These constraints on the data set enabled the results to be interpreted as relative proportions from different fetal organs. That is, a panel of previously selected fetal tissue-specific RNA transcripts exhibiting temporal variation can be used as a foundation for applying quadratic programming in order to determine the relative tissue-specific RNA contribution in one or more samples.

When considered individually, quantification of each of these fetal tissue-specific transcripts within the maternal plasma can be used as a measure of the apoptotic rate of that particular fetal tissue during pregnancy. Normal fetal organ development is tightly regulated by cell division and apoptotic cell death. Developing tissues compete to survive and proliferate, and organ size is the result of a balance between cell proliferation and death. Due to the close association between aberrant cell death and developmental diseases, therapeutic modulation of apoptosis has become an area of intense research, but with this comes the demand for monitoring the apoptosis rate of specific tissues. Quantification of fetal cell-free RNA transcripts provides such prognostic value, especially in premature births, where the incidence of apoptosis in various organs of preterm infants has been shown to contribute to neurodevelopmental deficits and cerebral palsy.

Sample Collection and Study Design

Selection of Fetal Tissue Specific Transcript Panel

To detect the presence of these fetal tissue-specific transcripts, a list of known fetal tissue-specific genes was prepared from known literature and databases. The specificity for fetal tissues was validated by cross referencing between two main databases: TiSGeD (Xiao, S.-J., Zhang, C. & Ji, Z.-L. TiSGeD: a Database for Tissue-Specific Genes. Bioinformatics (Oxford, England) 26, 1273-1275 (2010)) and BioGPS (Wu, C. et al. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome biology 10, R130 (2009); Su, A. I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America 101, 6062-7 (2004)). Most of these selected transcripts are associated with known fetal developmental processes. This list of genes was overlapped with RNA sequencing and microarray data to generate the panel of selected fetal tissue-specific transcripts shown in FIG. 8.

Subjects

Samples of maternal blood were collected from normal pregnant women during the first trimester, second trimester, third trimester, and post-partum. For positive controls, fetal tissue-specific RNA from the various fetal tissue types was bought from Agilent. Negative controls for the experiments were performed by running the entire process with water, as well as with samples that did not undergo the reverse transcription process.

Blood Collection and Processing

At each time-point, 7 to 15 mL of peripheral blood was drawn from each subject. Blood was centrifuged at 1600 g for 10 min, and the supernatant was transferred to microcentrifuge tubes for further centrifugation at 16000 g for 10 min to remove residual cells. The above steps were carried out within 24 hours of the blood draw. The resulting plasma was stored at −80° C. for subsequent RNA extractions.

RNA Extraction

Cell free RNA extractions were carried out using Trizol followed by Qiagen's RNeasy Mini Kit. To ensure that there was no contaminating DNA, DNase digestion was performed after RNA elution using RNase-free DNase from Qiagen. The resulting cell free RNA from the pregnant subjects was then processed using standard microarray and Illumina RNA-Seq protocols. These steps generated the sequencing library used to produce the RNA-Seq data as well as the microarray expression data. The remaining cell free RNA was then used for parallel qPCR.

Parallel qPCR of Selected Transcripts

Accurate quantification of these fetal tissue-specific transcripts was carried out using the Fluidigm BioMark system (see, e.g., Spurgeon, S. L., Jones, R. C. & Ramakrishnan, R. High throughput gene expression measurement with real time PCR in a microfluidic dynamic array. PloS one 3, e1662 (2008)). This system allows for simultaneous query of a panel of fetal tissue-specific transcripts. Two parallel forms of inquiry were conducted using different starting sources of material: one used the cDNA library from the Illumina sequencing protocol and the other used the eluted RNA directly. Both sources of material, RNA and cDNA, were preamplified with EvaGreen primers targeting the genes of interest. The cDNA was preamplified using EvaGreen PCR supermix and primers. The RNA source was preamplified using the CellsDirect One-Step qRT-PCR kit from Invitrogen. Modifications were made to the default One-Step qRT-PCR protocol to accommodate a longer incubation time for reverse transcription. 19 cycles of preamplification were conducted for both sources, and the collected PCR products were cleaned up using Exonuclease I treatment. To increase the dynamic range and the ability to quantify the efficiency of the later qPCR steps, serial dilutions of the PCR products were performed (5-fold, 10-fold, and 10-fold dilutions). Each of the maternal plasma samples collected from individual pregnant women across the time points went through the same procedures and was loaded onto 48×48 Dynamic Array Chips from Fluidigm to perform the qPCR. For positive controls, fetal tissue-specific RNA from the various fetal tissue types was bought from Agilent. Each of these RNAs from fetal tissues went through the same preamplification and clean-up steps. A pooled sample with equal proportions of the different fetal tissues was also created for later analysis to deconvolute the relative contribution of each tissue type in the pooled sample.

All collected data from the Fluidigm BioMark system were pre-processed using Fluidigm Real Time PCR Analysis software to obtain the respective Ct values for each of the transcripts across all samples. Negative controls for the experiments were performed by running the entire process with water, as well as with samples that did not undergo the reverse transcription process.

Data Analysis:

Fetal tissue-specific RNA transcripts clear from the maternal peripheral bloodstream within a short period after birth. That is, the post-partum cell-free RNA transcriptome of maternal blood lacks fetal tissue-specific RNA transcripts. As a result, the quantity of these fetal tissue-specific transcripts is expected to be higher before birth than after birth. The data of interest were the relative quantitative changes of the tissue-specific transcripts across all three trimesters of pregnancy as compared to this baseline level after the baby is born. As described in the methods, the fetal tissue-specific transcripts were quantified in parallel using both the actual cell-free RNA and the cDNA library of the same cell-free RNA. An example of the raw data obtained is shown in FIGS. 9A and 9B. The qPCR system gave a better quality readout using the cell-free RNA as the initial source. Focusing on the qPCR results from the direct cell-free RNA source, the analysis was conducted by comparing the fold change in the level of each of these fetal tissue-specific transcripts across all three trimesters, using the post-partum level as the baseline for comparison. The Delta-Delta Ct method was employed (Schmittgen, T. D. & Livak, K. J. Analyzing real-time PCR data by the comparative CT method. Nature Protocols 3, 1101-1108 (2008)). Each transcript expression level was compared to the housekeeping genes to get the delta Ct value. Subsequently, to compare each trimester to after birth, the delta-delta Ct method was applied using the post-partum data as the baseline.
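The comparative Ct calculation described above can be sketched as follows. This is a minimal Python illustration that assumes ideal amplification efficiency (a doubling per cycle); the Ct values and function names are hypothetical.

```python
def delta_ct(ct_target, ct_housekeeping):
    """Ct of the target transcript relative to the housekeeping gene."""
    return ct_target - ct_housekeeping


def fold_change_vs_baseline(ct_target, ct_hk, ct_target_base, ct_hk_base):
    """Delta-delta Ct fold change of a sample relative to a baseline sample.

    Assumes roughly 100% PCR efficiency, so one cycle equals a factor of 2.
    """
    ddct = delta_ct(ct_target, ct_hk) - delta_ct(ct_target_base, ct_hk_base)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: a trimester sample versus the post-partum baseline.
fc = fold_change_vs_baseline(
    ct_target=24.0, ct_hk=18.0,            # trimester sample
    ct_target_base=27.0, ct_hk_base=18.0,  # post-partum baseline
)
print(fc)  # 8.0
```

A lower delta Ct in the trimester sample than in the post-partum baseline yields a fold change above 1, consistent with a transcript that is more abundant during pregnancy.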

Results and Discussion:

As shown in FIGS. 10, 11, and 12, the tissue-specific transcripts are generally found at a higher level during the trimesters as compared to after birth. In particular, the tissue-specific panels of placental, fetal brain, and fetal liver specific transcripts showed the same bias, where these transcripts are typically found at higher levels during pregnancy than after birth. Between the different trimesters, a general trend showed that the quantity of these transcripts increases as the pregnancy progresses.

Biological Significance of Quantified Fetal Tissue-Specific RNA: Most of the transcripts in the panel are involved in fetal organ development and many are also found within the amniotic fluid. One such example is ZNF238. This transcript is specific to fetal brain tissue and is known to be vital for cerebral cortex expansion during embryogenesis when neuronal layers are formed. Loss of ZNF238 in the central nervous system leads to severe disruption of neurogenesis, resulting in a striking postnatal small-brain phenotype. Using methods of the invention, one can determine whether ZNF238 is present at healthy, normal levels according to the stage of development.

Known defects due to the loss of ZNF238 include a striking postnatal small-brain phenotype: microcephaly, agenesis of the corpus callosum, and cerebellar hypoplasia. Microcephaly can sometimes be diagnosed before birth by prenatal ultrasound. In many cases, however, it might not be evident by ultrasound until the third trimester. Typically, diagnosis is not made until birth or later in infancy upon finding that the baby's head circumference is much smaller than normal. Microcephaly is a life-long condition and currently untreatable. A child born with microcephaly will require frequent examinations and diagnostic testing by a doctor to monitor the development of the head as he or she grows. Early detection of ZNF238 differential expression using methods of the invention provides for prenatal diagnosis and may hold prognostic value for drug treatments and dosing during the course of treatment.

Beyond ZNF238, many of the characterized transcripts may hold diagnostic value in developmental diseases involving apoptosis, i.e., diseases caused by removal of unnecessary neurons during neural development. Seeing that apoptosis of neurons is essential during development, one could extrapolate that similar apoptosis might be activated in neurodegenerative diseases such as Alzheimer's disease, Huntington's disease, and amyotrophic lateral sclerosis. In such a scenario, the methodology described herein will allow for close monitoring of disease progression and possibly an ideal dosage according to the progression.

Deducing relative contributions of different fetal tissue types: Differential rates of apoptosis of specific tissues may directly correlate with certain developmental diseases. That is, certain developmental diseases may increase the levels of particular tissue-specific RNA transcripts observed in the maternal transcriptome. Knowledge of the relative contribution from various tissue types will allow for observation of these types of changes during the progression of these diseases. The quantified panel of fetal tissue-specific transcripts during pregnancy can be considered as a summation of the contributions from the various fetal tissues.

Expressing this as an equation,

$Y_{i} = \sum\limits_{j} \pi_{j} x_{ij} + \varepsilon$

where Y_i is the observed transcript quantity in maternal plasma for gene i, x_ij is the known transcript quantity for gene i in known fetal tissue j, π_j is the fractional contribution of fetal tissue type j, and ε is the normally distributed error. Additional physical constraints include:

1. The fractions contributing to the observed quantification sum to 1, given by the condition: Σ_j π_j = 1
2. The contribution from each tissue type has to be greater than or equal to zero, since there is no physical meaning to a negative contribution. This is given by π_j ≥ 0, since π is defined as the fractional contribution of each tissue type.

Consequently, to obtain the optimal fractional contribution of each tissue type, the least-squares error is minimized. The above equations are then solved using quadratic programming in R to obtain the optimal relative contributions of the tissue types towards the maternal cell free RNA transcripts. In the workflow, the quantities of RNA transcripts are given relative to the housekeeping genes in terms of Ct values obtained from qPCR. Therefore, the Ct value can be considered a proxy for the measured transcript quantity. A change in Ct value of one corresponds to approximately a two-fold change in transcript quantity, i.e., 2 raised to the power of 1 (with a higher Ct indicating a lower quantity). The process begins with normalizing all of the data in Ct relative to the housekeeping gene, and is followed by quadratic programming.
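The constrained least-squares problem described above can be illustrated with a short Python sketch. It does not reproduce the quadratic programming routine in R; instead, as a stand-in, it uses projected gradient descent with a Euclidean projection onto the set of non-negative fractions that sum to one, which targets the same optimum. The tissue signature matrix, the observed values, and the function names are hypothetical, and the quantities are taken to be on a linear scale.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def deconvolve(X, y, iterations=2000):
    """Estimate tissue fractions pi minimizing sum_i (y_i - sum_j pi_j X[i][j])^2.

    X[i][j]: known quantity of gene i in tissue j.
    y[i]:    observed quantity of gene i in the mixed sample.
    """
    n_genes, n_tissues = len(X), len(X[0])
    # Squared Frobenius norm bounds the largest eigenvalue of X^T X,
    # so 1 / bound is a safe gradient step size.
    step = 1.0 / sum(x * x for row in X for x in row)
    pi = [1.0 / n_tissues] * n_tissues
    for _ in range(iterations):
        residual = [sum(X[i][j] * pi[j] for j in range(n_tissues)) - y[i]
                    for i in range(n_genes)]
        grad = [sum(X[i][j] * residual[i] for i in range(n_genes))
                for j in range(n_tissues)]
        pi = project_to_simplex([pi[j] - step * grad[j]
                                 for j in range(n_tissues)])
    return pi


# Hypothetical signatures: rows are genes, columns are three tissues.
X = [[10.0, 1.0, 0.0],
     [0.0, 8.0, 1.0],
     [1.0, 0.0, 12.0]]
true_pi = [0.2, 0.3, 0.5]
y = [sum(X[i][j] * true_pi[j] for j in range(3)) for i in range(3)]

estimate = deconvolve(X, y)
print([round(p, 3) for p in estimate])
```

Because the projection is applied after every step, each iterate already satisfies both physical constraints, so the result can be read directly as relative tissue proportions.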

As a proof of concept for the above scheme, different fetal tissue types (Brain, Placenta, Liver, Thymus, Lung) were mixed in equal proportions to generate a pooled sample. Each fetal tissue type (Brain, Placenta, Liver, Thymus, Lung), along with the pooled sample, was quantified using the same Fluidigm BioMark System to obtain the Ct values from qPCR for each fetal tissue-specific transcript across all tissues and the pooled sample. These values were used to perform the same deconvolution. The resulting fractions for the fetal tissues (Brain, Placenta, Liver, Thymus, Lung) were 0.109, 0.206, 0.236, 0.202, and 0.245, respectively.

Conclusion:

In summary, the panel of fetal specific cell free transcripts provides valuable biological information across different fetal tissues at once. Most particularly, the method can deduce the different relative proportions of fetal tissue-specific transcripts to total RNA, and, when considered individually, each transcript can be indicative of the apoptotic rate of the fetal tissue. Such measurements have numerous potential applications for developmental and fetal medicine. Most human fetal development studies have relied mainly on postnatal tissue specimens or aborted fetuses. Methods described herein provide a rapid assay of the rate of fetal tissue/organ growth or death in live fetuses with minimal risk to the pregnant mother and fetus. Similar methods may be employed to monitor major adult organ tissue systems that exhibit specific cell free RNA transcripts in the plasma.

Example 3: Deconvolution of Adult Cell-Free Transcriptome

Overview:

The plasma RNA profiles of 4 healthy, normal adults were analyzed. Based on the gene expression profiles of different tissue types, the methods described quantify the relative contributions of each tissue type towards the cell-free RNA component in a donor's plasma. For quantification, apoptotic cells from different tissue types are assumed to release their RNA into the plasma. Each of these tissues expresses a specific number of genes unique to the tissue type, and the observed cell-free RNA transcriptome is a summation of these different tissue types.

Study Design and Methods:

To determine the contribution of tissue-specific transcripts to the cell-free adult transcriptome, a list of known tissue-specific genes was prepared from known literature and databases. Two database sources were utilized: Human U133A/GNF1H Gene Atlas and RNA-Seq Atlas. Using the raw data from these two databases, tissue-specific genes were identified by the following method. A template-matching process was applied to data obtained from the two databases for the purpose of identifying tissue-specific genes. The list of tissue-specific genes identified by the method is given in FIG. 18. The specificity and sensitivity of the panel are constrained by the number of tissue samples in the database. For example, the Human U133A/GNF1H Gene Atlas dataset includes 84 different tissue samples, and a panel's specificity from that database is constrained by the 84 sample sets. Similarly, for the RNA-Seq Atlas, there are 11 different tissue samples and specificity is limited to distinguishing between these 11 tissues. After obtaining a list of tissue-specific transcripts from the two databases, the specificity of these transcripts was verified with literature as well as the TiSGeD database.
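The text does not spell out the template-matching procedure, so the following Python sketch shows only one plausible form of it, under stated assumptions: each gene's expression vector across tissues is correlated against a one-hot template (1 for a candidate tissue, 0 elsewhere), and the gene is called tissue-specific when the Pearson correlation meets a threshold. The threshold of 0.9, the gene names, and the expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def tissue_specific_genes(expression, tissues, threshold=0.9):
    """Map each gene to the tissue whose one-hot template it matches.

    expression: {gene: [expression level in each tissue, in tissue order]}
    threshold:  assumed cutoff on the correlation with the template.
    """
    hits = {}
    for gene, levels in expression.items():
        for idx, tissue in enumerate(tissues):
            template = [1.0 if k == idx else 0.0 for k in range(len(tissues))]
            if pearson(levels, template) >= threshold:
                hits[gene] = tissue
    return hits


# Hypothetical expression atlas with four tissues.
tissues = ["brain", "liver", "placenta", "lung"]
expression = {
    "GENE_A": [100.0, 2.0, 3.0, 1.0],    # high only in brain
    "GENE_B": [50.0, 48.0, 52.0, 51.0],  # broadly expressed
    "GENE_C": [1.0, 2.0, 90.0, 3.0],     # high only in placenta
}
print(tissue_specific_genes(expression, tissues))
```

With more tissues in the atlas, a gene must stand out against more alternatives to match a template, which mirrors the point above that specificity is bounded by the number of tissue samples in the database.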

The adult cell-free transcriptome can be considered as a summation of the tissue-specific transcripts obtained from the two databases. To quantitatively deduce the relative proportions of the different tissues in an adult cell-free transcriptome, quadratic programming is performed as a constrained optimization method to deduce the optimal relative contributions of different organs/tissues towards the cell-free transcriptome. The specificity and accuracy of this process are dependent on the table of genes provided in Figure X and the extent to which they are detectable in RNA-Seq and microarray.

Subjects: Plasma samples were collected from 4 healthy, normal adults.

Initial Results:

Deconvolution of our adult cell-free RNA transcriptome from microarray using the above methods revealed the relative contributions of the different tissues and organs, which are tabulated in FIG. 13.

FIG. 13 shows that the normal cell free transcriptome for adults is consistent across all 4 subjects. The relative contributions between the 4 subjects do not differ greatly, suggesting that the relative contributions from different tissue types are relatively stable between normal adults. Out of the 84 tissue types available, the deduced optimal major contributing tissues are whole blood and bone marrow.

An interesting tissue type contributing to circulating RNA is the hypothalamus. The hypothalamus is bounded by specialized brain regions that lack an effective blood-brain barrier; the capillary endothelium at these sites is fenestrated to allow free passage of even large proteins and other molecules. In our case, we believe that RNA transcripts from apoptotic cells in that region could be released into the plasma cell free RNA component.

The same methods were performed on the subjects using RNA-Seq. The results described herein are limited by the amount of tissue-specific RNA-Seq data available. However, it is understood that tissue-specific data is expanding with the increasing rate of sequencing of various tissue types, and future analysis will be able to leverage those datasets. For RNA-Seq data (as compared to microarray), neither whole blood nor bone marrow samples are available. The cell free transcriptome can only be decomposed into the 11 different tissue types available in the RNA-Seq data. Of these, only relative contributions from the hypothalamus and spleen were observed, as shown in FIG. 14.

A list of 94 tissue-specific genes (as shown in FIG. 15) was further selected for verification with qPCR. The Fluidigm BioMark Platform was used to perform the qPCR on RNA derived from the following tissues: Brain, Cerebellum, Heart, Kidney, Liver, and Skin. A similar qPCR workflow was applied to the cell free RNA component as well. The delta Ct values, obtained by comparison with the housekeeping gene ACTB, were plotted in heatmap format in FIG. 16, which shows that these tissue-specific transcripts are detectable in the cell free RNA.

INCORPORATION BY REFERENCE

References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.

EQUIVALENTS

The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

What is claimed is:
 1. A method comprising: (a) sequencing nucleic acid molecules derived from a cell-free blood sample of a pregnant subject to determine at least one ribonucleic acid (RNA) level of at least one genomic locus that is differentially expressed in a first population of subjects having pre-term birth as compared to a second population of subjects not having pre-term birth; (b) computer processing said at least one RNA level of said at least one genomic locus determined in (a) (i) against at least one reference RNA level of said at least one genomic locus or (ii) with a trained machine learning algorithm; and (c) determining, based at least in part on said computer processing in (b), that said pregnant subject has an elevated risk of having a pre-term birth, based at least in part on said computer processing in (c).
 2. The method of claim 1, wherein said cell-free blood sample comprises a serum sample or a plasma sample.
 3. The method of claim 1, wherein sequencing said nucleic acid molecules comprises reverse transcribing RNA molecules derived from said cell-free blood sample to produce complementary deoxyribonucleic acid (cDNA) molecules, and sequencing said cDNA molecules to determine said at least one RNA level of said at least one genomic locus.
 4. The method of claim 1, wherein said at least one genomic locus comprises a tissue-specific differentially expressed genomic locus.
 5. The method of claim 1, wherein said pregnant subject is in a first trimester of pregnancy, a second trimester of pregnancy, or a third trimester of pregnancy.
 6. The method of claim 1, wherein said at least one reference RNA level is determined from pregnant subjects or non-pregnant subjects.
 7. The method of claim 1, wherein processing said at least one RNA level of said at least one genomic locus against said at least one reference RNA level further comprises determining a difference between said at least one RNA level of said at least one genomic locus and said at least one reference RNA level.
 8. The method of claim 7, further comprising determining a level of fold change in quantitative polymerase chain reaction (qPCR) measurements based at least in part on data corresponding to said levels of said set of RNA transcripts and said reference levels to determine said difference.
 9. The method of claim 7, further comprising performing principal component analysis on data corresponding to said levels of said set of RNA transcripts and said reference levels to determine said difference.
 10. The method of claim 1, wherein said at least one genomic locus comprises at least two genomic loci selected from the group of genes consisting of B3GNT2, PPBPL2, PTGS2, U2AF1, CSH1, CAPN6, CYP19A1, SVEP1, PAPPA, and PSG1.
 11. A system comprising: one or more computer processors; and a memory comprising instructions stored thereon that, when executed by said one or more computer processors, cause said one or more computer processors to perform: (a) sequencing nucleic acid molecules derived from a cell-free blood sample of a pregnant subject to determine at least one ribonucleic acid (RNA) level of at least one genomic locus that is differentially expressed in a first population of subjects having pre-term birth as compared to a second population of subjects not having pre-term birth; (b) computer processing said at least one RNA level of said at least one genomic locus determined in (a) (i) against at least one reference RNA level of said at least one genomic locus or (ii) with a trained machine learning algorithm; and (c) determining, based at least in part on said computer processing in (b), that said pregnant subject has an elevated risk of having a pre-term birth, based at least in part on said computer processing in (c).
 12. The system of claim 11, wherein said cell-free blood sample comprises a serum sample or a plasma sample.
 13. The system of claim 11, wherein sequencing said nucleic acid molecules comprises reverse transcribing RNA molecules derived from said cell-free blood sample to produce complementary deoxyribonucleic acid (cDNA) molecules, and sequencing said cDNA molecules to determine said at least one RNA level of said at least one genomic locus.
 14. The system of claim 11, wherein said at least one genomic locus comprises a tissue-specific differentially expressed genomic locus.
 15. The system of claim 11, wherein said pregnant subject is in a first trimester of pregnancy, a second trimester of pregnancy, or a third trimester of pregnancy.
 16. The system of claim 11, wherein said at least one reference RNA level is determined from pregnant subjects or non-pregnant subjects.
 17. The system of claim 11, wherein processing said at least one RNA level of said at least one genomic locus against said at least one reference RNA level further comprises determining a difference between said at least one RNA level of said at least one genomic locus and said at least one reference RNA level.
 18. The system of claim 17, wherein determining said difference further comprises determining a level of fold change in quantitative polymerase chain reaction (qPCR) measurements based at least in part on data corresponding to said levels of said set of RNA transcripts and said reference levels.
 19. The system of claim 17, wherein determining said difference further comprises performing principal component analysis on data corresponding to said levels of said set of RNA transcripts and said reference levels.
 20. The system of claim 11, wherein said at least one genomic locus comprises at least two genomic loci selected from the group of genes consisting of B3GNT2, PPBPL2, PTGS2, U2AF1, CSH1, CAPN6, CYP19A1, SVEP1, PAPPA, and PSG1.